Font Acknowledgment and Character Extraction of Digital and Scanned Images

Author

  • Syed Muhammad Arsalan Bashir
Abstract

Font recognition and character extraction are of immense importance because there are many scenarios in which data exist in a form that cannot be processed directly, such as an image or a hard copy. The procedure developed in this paper identifies the font (Times New Roman, Arial or Comic Sans MS) and then recovers the text using a simple correlation-based method in which binary templates are correlated with the text characters of the input image. The extraction is performed in the presence of a small amount of noise, since images may contain noisy patterns due to photocopying. The method is significant for extracting data from sources such as monitoring (surveillance) camera footage. It is implemented in MATLAB; it takes an input image and writes the recovered text and font information to a text file.

General Terms: Pattern Recognition, Font Extraction, Image Processing.
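
The correlation-based matching described in the abstract can be illustrated with a minimal sketch. This is not the paper's MATLAB code: it is written in plain Python, the 3x5 glyphs are invented toy bitmaps, and the names `corr2`, `to_grid`, `classify`, `TEMPLATES`, `SerifToy` and `SansToy` are hypothetical. It assumes each character has already been segmented and scaled to the template size, and it scores a glyph against every (font, character) template with a 2-D correlation coefficient, keeping the best match.

```python
from math import sqrt


def corr2(a, b):
    """Pearson correlation coefficient between two equal-sized 2-D grids."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("glyph and template must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant grid has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def to_grid(rows):
    """Convert rows of '#'/'.' characters into a binary grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


# Toy binary templates keyed by (font, character). Both fonts and all
# glyph shapes are made up for illustration (assumption, not real font data).
TEMPLATES = {
    ("SerifToy", "I"): to_grid(["###", ".#.", ".#.", ".#.", "###"]),
    ("SansToy", "I"): to_grid([".#.", ".#.", ".#.", ".#.", ".#."]),
    ("SerifToy", "L"): to_grid(["#..", "#..", "#..", "#..", "###"]),
    ("SansToy", "L"): to_grid(["#..", "#..", "#..", "#..", "##."]),
}


def classify(glyph, templates=TEMPLATES):
    """Return the best-matching (font, character) key and its correlation."""
    scores = {key: corr2(glyph, tpl) for key, tpl in templates.items()}
    best_key = max(scores, key=scores.get)
    return best_key, scores[best_key]


# A serif-style "I" with one extra pixel set, standing in for photocopy noise.
NOISY_I = to_grid(["###", ".#.", ".##", ".#.", "###"])

if __name__ == "__main__":
    print(classify(NOISY_I))
```

Because the score is a normalized correlation, a single flipped pixel lowers the match against the correct template only slightly, which is consistent with the abstract's claim that a small amount of noise can be tolerated. Real scans would additionally need binarization and character segmentation before this step.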


Similar resources

Font group identification using reconstructed fonts

Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed, highly readable, and faithful to the original. These goals can theoretically be achieved through OCR and font recognition, re-typesetting the document text with original fonts. However, OCR and font recognition remain hard problems, and many historical documents use fonts that are no...


Least-Squares Font Metric Estimation from Images

Gary E. Kopec, Xerox Palo Alto Research Center, June 15, 1993. Abstract: Character placement in digital typography is commonly defined in terms of a set of parameters (font metrics) that specify the relative locations of the local coordinate systems of adjacent characters in a line of text. The primary font metrics for horizontal letterspacing are the character set widths and sidebearings and the ...


A FILTERED B-SPLINE MODEL OF SCANNED DIGITAL IMAGES

We present an approach for modeling and filtering digitally scanned images. The digital contour of an image is segmented to identify the linear segments, the nonlinear segments and critical corners. The nonlinear segments are modeled by B-splines. To remove the contour noise, we propose a weighted least squares model to account for both the fitness of the splines as well as their approximate cur...


A Font and Size Independent Content Based Retrieval System for Kannada Document Images

This paper presents a Content based image retrieval system for Kannada Document images. Given a query word, the system returns the documents in the database in which there is a similar word, with the word highlighted. The retrieval works for Kannada document images which have different font sizes and styles. First the scanned Kannada document images are preprocessed to reduce image noise. Then ...


Kannada Text Extraction from Images and Videos for Vision Impaired Persons

We propose a system that reads the Kannada text encountered in natural scenes with the aim of providing assistance to the visually impaired persons of Karnataka state. This paper describes the system design and a standard-deviation-based Kannada text extraction method. The proposed system contains three main stages: text extraction, text recognition and speech synthesis. This paper concentrated on te...



Journal title:
  • CoRR

Volume abs/1305.4064

Publication date 2013